Histogram analysis in the differentiation between adrenal adenomas and pheochromocytomas: the value of a single measurement

Objective To assess the diagnostic accuracy of histogram analysis on unenhanced computed tomography (CT) for differentiating between adrenal adenomas and pheochromocytomas (PCCs). Materials and Methods We retrospectively identified patients with proven PCCs who had undergone CT examinations between January 2009 and July 2019 at one of two institutions. For each PCC, we selected one or two adenomas diagnosed within two weeks of the date of diagnosis of the PCC. For each lesion, two readers scored the size, determined the mean attenuation, and generated a voxel histogram. The 10th percentile (P10) was obtained from the conventional histogram analysis, as well as being calculated with the following formula: P10 = mean attenuation – (1.282 × standard deviation). The mean attenuation threshold, histogram analysis (observed) P10, and calculated P10 (calcP10) were compared in terms of their diagnostic accuracy. Results We included 52 adenomas and 29 PCCs. The sensitivity, specificity, and accuracy of the mean attenuation threshold were 75.0%, 100.0%, and 82.5%, respectively, for reader 1, whereas they were 71.5%, 100.0%, and 81.5%, respectively, for reader 2. The sensitivity, specificity, and accuracy of the observed P10 and calcP10 were equal for both readers: 90.4%, 96.5%, and 92.6%, respectively, for reader 1; and 92.3%, 93.1%, and 92.6%, respectively, for reader 2. The increase in sensitivity was significant for both readers (p = 0.009 and p = 0.005, respectively). Conclusion For differentiating between adenomas and PCCs, the histogram analysis (observed P10 and calcP10) appears to outperform the mean attenuation threshold as a diagnostic criterion.


INTRODUCTION
Adrenal incidentalomas (AIs) are a common finding on cross-sectional abdominal imaging, with a prevalence ranging from 1.0% to 8.7%, depending on the age of the individual; that is, increasing with advancing age (1,2) . It is estimated that pheochromocytomas (PCCs) account for 3-7% of all AIs (3) . Conversely, although clinically silent PCCs were previously assumed to be uncommon, some recent studies have estimated that up to 30% of PCCs are diagnosed as an AI (4,5) . A PCC is usually diagnosed on the basis of typical symptoms, such as headache, tachycardia, facial flushing, and sweating (4) ; elevated blood pressure; and elevated plasma levels of metanephrines-the O-methylated metabolites of catecholamines (6,7) . The knowledge that a substantial proportion of patients with a PCC are either asymptomatic or mildly symptomatic is concerning because these lesions are associated with an increased risk of severe cardiovascular complications related to the excessive production of catecholamines (8,9) . In this scenario, the search for reliable criteria to differentiate between adrenal adenomas (AAs) and PCCs when an AI is found has been the focus of numerous studies in the literature (10) . Some studies (7,11) , including a multi-institutional analysis (11) , have indicated that an AI with a mean attenuation ≤ 10 HU on unenhanced computed tomography (CT) scans may not require biochemical screening to exclude a PCC, although endocrinology societies continue to recommend such screening (12) . However, it is also well known that a significant proportion of AAs do not contain enough intracytoplasmic lipids and will therefore display a mean attenuation > 10 HU (13) ; that is, they are the socalled lipid-poor adenomas (LPAs). In that scenario, the washout technique represents an alternative diagnostic criterion (14,15) . The technique compares the mean attenuation observed on delayed-phase images (acquired 10-15 minutes after contrast injection) with that observed in the venous and unenhanced phases. It is a reliable method to distinguish AAs from metastatic lesions (16,17) , with a few exceptions (e.g., hepatocellular carcinoma and renal cell carcinoma metastases). However, the method is inaccurate for distinguishing AAs from PCCs because the latter often present with marked washout (18,19) .
In recent decades, the analysis of voxels in a region of interest (ROI) placed on an AI (histogram analysis) has emerged as an alternative to improve the characterization of LPAs (20)(21)(22)(23) . Histogram analysis suggests that any homogeneous adrenal lesion ≤ 4.0 cm with at least 10% negative voxels could be assumed to be an AA (20) . More recently, there has been renewed interest in histogram analysis because some studies have demonstrated that the 10th percentile (P10) can be calculated from the mean attenuation of the same single ROI on unenhanced CT images, as long as the voxel distribution is normal (24,25) . However, histogram analysis has been tested only for distinguishing adenomas, especially LPAs, from PCCs in a single study, conducted by Remer et al. (26) . In that study, the authors assessed the P10 by counting voxels. Accordingly, we conducted the present study to assess the diagnostic accuracy of histogram analysis, using either voxel counting or a single measurement, for differentiating between AAs and PCCs.

MATERIALS AND METHODS Study population
This study was conducted at two separate centers and was approved by the institutional committees on human research of both centers. Due to the retrospective nature of the study, the requirement for written informed consent was waived. This work used the Radiology Information System and electronic pathology databases. Two radiologists (not involved with the imaging analysis) searched for "pheochromocytomas" in both databases and retrospectively identified the cases of all patients in whom a diagnosis of PCC was proven by biopsy or surgery and confirmed by histopathology, between January 2009 and July 2019. We selected one or two adenomas diagnosed within two weeks of the date of diagnosis of each PCC, prioritizing those diagnosed closest to that date, in order to avoid any chronologic bias. The two-week interval was chosen to ensure that patients were examined in the same CT scanner.
Among the 52 adenomas selected, the diagnosis was confirmed by histopathology (regardless of the size) in four, and the remaining 48 adenomas were diagnosed on the basis of the criteria established in the American College of Radiology (ACR) White Paper on incidental adrenal masses (10) : being homogeneous; measuring ≤ 4.0 cm; and remaining unchanged in size for at least 12 months. Of those 48 lesions, 27 (56.2%) showed signal loss on opposed-phase magnetic resonance imaging. The mean follow-up period for the patients with adenomas was 37.3 ± 23.1 months (range, 13-107 months).
We applied the same exclusion criteria for AAs and PCCs: CT images unavailable; no unenhanced CT images available; suboptimal images (e.g., with extensive artifacts due to respiratory motion or metallic implants); low contrast-to-noise ratio (CNR) in the images; and thick (> 3.0 mm) slices. We retrieved the records for 64 PCCs, 35 of which were excluded: 28 because no CT images were available; two because the images were unenhanced; three because the images were suboptimal, including metallic artifacts in the adrenal area; and two because there was a low CNR due to obesity. Of the original 60 patients with AAs, six were excluded: four due to suboptimal images and two due to low CNR. Two patients had bilateral AAs. Therefore, we evaluated two groups: PCC (n = 29) and AA (n = 52). The flow chart in Figure 1 details the selection process.

Clinical, biochemical, and demographic data
Two third-year radiology residents reviewed the electronic medical records of all enrolled patients. They retrieved (or confirmed) the following data related to each patient: age, sex, endocrinological test results, and histopathology fi ndings when available.

CT protocols and image quality
The CT examinations were conducted in one of three multidetector scanners: a Brilliance 16-slice scanner (Philips, Best, The Netherlands); a Somatom Sensation 64-slice scanner (Siemens, Erlangen, Germany); or a LightSpeed VCT 64-slice scanner (General Electric, Milwaukee, WI, USA). During image acquisition, the slice thickness was 1.0 mm or 1.25 mm, the tube voltage was 120 kVp, and the tube current was variable, as defi ned by patient size and body habitus (27) . All images were post-processed and reconstructed using a standard soft tissue algorithm at a collimation of 3.0 mm with no overlap reconstruction. When used, the pitch was set at 1:1 for all examinations.
Because of the retrospective nature of the study, the CT examinations were performed using different protocols according to the indication. However, we had access to data from unenhanced CT examinations (e.g., for urinary lithiasis) as well as from multiphasic CT examinations, including images acquired in the arterial, portal, and equilibrium phases, depending on the clinical indication, as well as in the delayed phase (e.g., during urological CT).
The contrast-enhanced images were obtained in the arterial phase in accordance with the bolus tracking used in order to ensure synchronization. For the portal phase, the images were acquired at 60-70 s after the beginning of the injection. Iodinated contrast material-Visipaque 320 (General Electric) or Ultravist (Bayer Schering Pharma AG, Berlin, Germany)-was administered with a power injector, in a dose ranging from 100 mL to 120 mL (depending on the weight of the patient), at a rate of 2-4 mL/s. The image quality was assessed by a senior radiologist, who placed ROIs ≥ 1.0 cm in diameter in the adrenal gland, liver, and spleen, as well as in the retroperitoneal fat adjacent to and in the adrenal gland lesion, to measure the CNR. A CNR > 2.0 was set as the threshold for considering an image to be of good quality (28) . The CNR was calculated using the following formula (29) : CNR = (mean adrenal lesion density ∕ SD of adrenal lesion density − mean paravertebral density)

CT image analysis
Two independent readers with 4 and 5 years of experience in abdominal imaging, respectively, assessed the CT images. Both readers were blinded to the clinical, biochemical, and histopathology fi ndings. The readers subjectively evaluated all adrenal lesions, assessing homogeneity on the unenhanced and contrast-enhanced images. They tested for the presence of calcifi cations and noted the size of the lesions.
Using the OsiriX Dicom Viewer (PixMeo, Geneva, Switzerland), the readers assessed unenhanced CT images and placed an ROI in each adrenal lesion at its largest diameter. From that ROI, they determined the mean and standard deviation of the attenuation. The information for that ROI was saved in XML format. On the basis of the mean attenuation and the corresponding standard deviation, they estimated the P10 according to the following formula (30) : P10 = mean − (1.28 × standard deviation) The value obtained with that formula was defi ned as the calculated P10 (calcP10).
Histogram analysis was carried out by a radiologist with more than 20 years of experience in abdominal imaging, who obtained the data from the saved XML fi le. The radiologist gathered the information obtained in the ROI to generate a Microsoft Excel fi le containing the attenuation values from each voxel, using those values to create a histogram of HU values. The same radiologist evaluated the histogram data and counted the number of voxels that were measured to determine the observed density value of the P10. The value obtained was designated the observed P10 of the histogram analysis (hereafter referred to as the observed P10).
For the estimation of diagnostic accuracy in this study, a lesion was assumed to be an AA when the following criteria were met: a mean attenuation < 10 HU on unenhanced   (10) ; and an observed P10 or calcP10 < 0 HU. In clinical practice, two additional criteria can be combined with any other criterion from CT or magnetic resonance imaging (10,31) : the lesion should be homogeneous and should measure ≤ 4.0 cm. Here, we assessed diagnostic accuracy prior to applying those two criteria.

Standard of reference
For the PCCs, the reference standards were the histopathological report and the biochemical test results. Surgical and histopathological confirmation was available for only four of the AAs. For the remaining 48 AAs, the reference standard was no variation in size during a follow-up period of at least one year, as well as measuring ≤ 4.0 cm and being homogeneous, in accordance with the previously mentioned recommendations of the ACR white paper on incidental adrenal masses (10) . Those criteria were verified by a third observer, a radiologist with more than 20 years of experience in abdominal imaging, before imaging analysis by the two readers.

Statistical analysis
The Stata statistical software package, version 15 (StataCorp LP, College Station, TX, USA) was used for all analyses. The level of statistical significance was set at p > 0.05. The Shapiro-Wilk statistical test was employed to determine which continuous variables had a normal distribution including the histogram derived from the voxels. For normally distributed variables, Student's t-test was used in order to compare the means between the two groups. The Mann-Whitney U test was employed to compare all other variables. The chi-square test was used in order to compare categorical variables. The diagnostic accuracies of the mean attenuation (using a cutoff value of 10 HU), the  observed P10, and the calcP10 were compared by McNemar's test. The interobserver agreement for density patterns (homogeneous or heterogeneous) and for the presence of calcifications was assessed with intraclass correlation coefficients via calculation of Cohen's kappa (κ).

RESULTS
The demographic data are shown in Table 1. The mean age of the patients was significantly lower in the PCC group than in the AA group (p = 0.001). There was no significant difference between the two groups for patient sex or lesion laterality. The measurements obtained by reader 1 indicated that 13 (25.0%) of the 52 AAs were LPAs (mean attenuation > 10 HU), whereas those obtained by reader 2 indicated that 16 (30.7%) were LPAs.
When the imaging parameters on CT images were assessed (Table 2), the PCCs were found to be significantly larger than the AAs. On their longest axis, the PCCs were, on average, twice as large as the AAs (2.53 vs. 5.32 cm for   reader 1 and 2.42 vs. 5.03 cm for reader 2; p < 0.0001 for both). Only three of the AAs (5.8%) were > 4.0 cm, whereas 12 of the PCCs (41.4%) were ≤ 4.0 cm. The mean attenuation value on unenhanced images was also signifi cantly different between AAs and PCCs (4.84 vs. 36.58 HU for reader 1 and 5.83 vs. 36.63 HU for reader 2; p < 0.0001 for both). Similarly, the difference between AAs and PCCs for the observed P10 was also signifi cant for both readers (p < 0.0001). The difference was also signifi cant for the calcP10 (p = 0.01 for reader 1 and p = 0.001 for reader 2).
The AAs were homogeneous in 43 (82.7%) cases for reader 1 and in 42 (80.8%) for reader 2 (Figure 2). For both readers, the PCCs were homogeneous in only seven (24.1%) of the 29 cases. The difference in this distribution was statistically signifi cant (p < 0.0001 for both readers). The interobserver agreement for homogeneity was excellent (κ = 0.87; p = 0.00001). Among the AAs, both readers identifi ed calcifi cations in just one lesion (1.9%). Among the PCCs, readers 1 and 2 identifi ed calcifi cations in fi ve lesions (17.2%) and four lesions (13.8%), respectively. The difference between the two groups in the frequency of calcifi cations was statistically signifi cant (p = 0.01 for reader 1 and p = 0.03 for reader 2). The interobserver agreement for calcifi cation was also excellent (κ = 0.82; p = 0.00001).
The distribution of the mean attenuation values of all voxels was Gaussian in 87.6% of the patients for reader 1 and in 85.1% for reader 2. There was no signifi cant difference in that proportion between the AA and PCC groups: 86.5% of AAs for both readers; and 89.7% and 82.8% of PCCs for readers 1 and 2, respectively. This high frequency of Gaussian distribution is relevant because it is a requirement for using the formula to calculate the P10 via the mean and standard deviation. The correlation between the observed P10 and the calcP10 was strong for both readers (Figure 3): r = 0.9981 (p < 0.0001) for reader 1 and r = 0.9975 (p < 0.001) for reader 2.
When assessing the diagnostic criteria for the mean attenuation criterion, observed P10, and calcP10 (Table  3), we observed that the sensitivity and accuracy for diagnosing AAs were higher for the observed P10 and calcP10 than for the mean attenuation criterion. For reader 1, the sensitivity of the 10 HU mean attenuation cutoff value was 75.0% (95% CI: 61.0-85.9), whereas it was 90.4% (95% CI: 79.0-96.8) for the observed P10 and the calcP10 (p = 0.009). For reader 2, those values were 71.1% (95% CI: 52.9-86.9) and 92.3% (95% CI: 81.5-97.9), respectively (p = 0.0005). For both readers, the specifi city of the mean attenuation criterion was 100% (95% CI: 88.0-100.0). Among the 29 PCCs evaluated, readers 1 and 2 found that one (3.4%) and two (6.9%), respectively, showed more than 10% negative voxels. The specifi city of the observed P10 and calcP10 was the same for reader 1-96.5% (95% CI: 82.2-99.9)-because one small (1.5-cm) PCC had a cystic portion (Figure 4). Outside of that cystic area, the lesion showed more than 10% negative voxels. For reader 2, the specifi city of the observed P10 and calcP10 was also the same-93.1% (95% CI: 77.2-99.1). In addition to the small PCC mentioned above, a large (5.4-cm) PCC containing a large cystic area showed 15.4% negative voxels. The higher diagnostic accuracy was signifi cant for reader 1 and for reader 2 (p = 0.05 and p = 0.03, respectively). In the AA group, 18 (34.6%) of the 52 lesions were LPAs Figure 2. A 53-year-old female patient. A: Axial contrast-enhanced CT scan, in the venous phase, showing a right-sided homogeneous adrenal lesion measuring 2.8 cm (asterisk). B: Axial unenhanced CT scan at the same level with an ROI drawn at the center of the lesion, showing a mean attenuation of 24.7 HU, which is suggestive of an LPA (although not meeting the criteria at this point). However, the histogram analysis P10 and calcP10 showed that there was more than 10% negative voxels, further suggesting a diagnosis of LPA. At this writing, the patient is asymptomatic and the lesion has been stable for 62 months.
A B Figure 3. Scatter plot showing a strong positive correlation for the P10 observed on a histogram analysis (y-axis) and that calculated from a single ROI on an unenhanced CT scan (x-axis), by reader 1 (A) and reader 2 (B).
A B Table 3-Diagnostic accuracy of the mean attenuation criterion, observed P10, and calcP10.  . B: Axial unenhanced CT scan at the same level with an ROI drawn to avoid the cystic portion. The mean attenuation was 12.5 HU. The histogram analysis P10 and calcP10 showed more than 10% negative voxels. Aside from the clinical and biochemical setting, this could be assumed to be an AA, based on histogram analysis and calcP10 criteria, were it not for the heterogeneity of the lesion.

A B
(mean attenuation > 10 HU) and all of them were correctly assigned by the observed P10 and calcP10, indicating that a subgroup analysis would show higher sensitivity. The lower specificity was not significant for reader 1 (p = 0.09), although it was for reader 2 (p = 0.02). It is noteworthy that the two PCCs showing more than 10% negative voxels were heterogeneous with cystic areas and that the larger one was > 4.0 cm.

DISCUSSION
Our data indicate that histogram analysis, using either voxel counting or the P10 formula, can help differentiate between an LPA and a PCC on unenhanced images without sacrificing specificity. This criterion, if confirmed in future studies, could be a powerful adjuvant technique for assessing AIs on CT images.
In 1998, Korobkin et al. (14) and Szolar and Kammerhuber (15) reported on the use of washout to characterize LPAs. That criterion was initially assumed to be effective for the differentiation between PCCs and AAs. However, Happel and Heinz-Peer (19) subsequently showed that PCCs could have washouts in the same range as those of AAs. That claim was confirmed in other, similar studies (32,33) . Patel et al. (34) showed that 33% of PCCs had either absolute or relative washout values meeting the cutoff criteria for AAs, as well as that half of those PCCs were homogeneous in the four phases studied: unenhanced, arterial, venous, and equilibrium. Of greater relevance in that study, all of the PCCs had a mean attenuation on unenhanced images above the 10 HU cutoff.
Recently, Canu et al. (3) and Sane et al. (7) reported that risk of being a PCC is negligible for homogeneous lesions measuring ≤ 4.0 cm and showing a mean attenuation ≤ 10 HU; international guidelines then recommended that no further endocrinological screening be performed for lipidrich adenomas (11,12) . These studies were important because a previous study, from the early 2000s, reported that some adrenal lesions with a mean attenuation ≤ 10 HU were found to be PCCs (35) . Our findings are in keeping with those of the more recent studies: none of the PCCs in our sample had a mean attenuation ≤ 10 HU.
With the emergence of voxel counting, histogram analysis has been established using the rule of 10% negative voxels for characterizing an adenoma (20) . Several studies have confirmed the increased diagnostic accuracy when compared to the mean attenuation on unenhanced images (21-24) . Hsu et al. (24) and Rocha et al. (25) recently introduced a simpler, faster method of applying the criterion of 10% negative voxels, without voxel counting, based on a statistical formula to estimate the P10 of any Gaussian distribution. To date, only Remer et al. (26) assessed the P10 criterion for distinguishing AAs from PCCs. However, those authors compared an AA group to a non-AA group that included metastases and PCCs, without reporting an isolated comparison between AAs and PCCs. They reported that the sensitivity of histogram analysis was not superior to that of the mean attenuation, the former being 69.5% and 72.4%, respectively, for two readers. They also found no significant difference between the mean attenuation and the P10 criterion in terms of the overall specificity, and they did not report the specificity for PCCs alone.
Of the 29 PCCs evaluated in the present study, one (3.4%) and two (6.9%) were found to show more than 10% negative voxels by readers 1 and 2, respectively. Therefore, the specificity of observed P10 and calcP10 was 96.0% for reader 1 and 93.1% for reader 2. However, in our sample, in contrast to what was found by Remer et al. (26) , the sensitivity of the observed P10 and calcP10 was found to be significantly greater than that of the mean attenuation criterion, for both readers (p = 0.009 and p = 0.0005 for readers 1 and 2, respectively), as was the overall accuracy of the observed P10 and calcP10. In the two cases of PCCs with more than 10% negative voxels, both had cystic areas. One possible explanation for this increased proportion of negative voxels is that there could be microscopic cystic areas that were not perceived when the ROIs were positioned. In both cases, the marked heterogeneity of the lesions would have prevented a presumptive diagnosis of AA. In our sample, the PCCs were significantly larger than the AAs, which could have been due to the fact that biopsy/surgical confirmation was an inclusion criterion, which unquestionably created a selection bias. Larger PCCs tend to be heterogeneous and are more likely to be symptomatic, therefore being more likely to be biopsied or surgically resected.
Another important parameter when assessing AIs is size. There is a consensus that surgical resection is indicated for adrenal masses measuring > 4.0 cm (excepting cysts and myelolipomas), regardless of any imaging criteria or functional status in the endocrinological evaluation (36) . In our study, 41.4% of the PCCs measured ≤ 4.0 cm on their longest axis. Only one of those PCCs had more than 10% negative voxels. That lesion was heterogeneous, with a well-defined cystic portion ( Figure 4).
Our study has some limitations. First, it was a retrospective study, with an increased risk of biases, especially selection bias, as cited for the PCCs. We aimed to minimize the risk by using appropriate inclusion and exclusion criteria. Second, the sample was small, which reflects the relatively low incidence of PCCs. Although AAs are common lesions, we included a small portion of them (at a ratio of 2:1) to avoid inducing significant discrepancy between the two groups, which could be up to 50 times, considering the high prevalence of AAs. Third, for the group of AAs, we used predominantly follow-up, instead of histopathological analysis, as the reference standard. However, surgical approaches (or even biopsy) are not indicated in major international guidelines for small, nonfunctioning adrenal lesions. The great majority of AAs should simply be monitored for confirmation. In addition, we applied the same criteria mentioned by experts in the ACR White Paper (10) , and those criteria have been used in several previous studies (15)(16)(17)(18)(20)(21)(22)(23) .
In conclusion, our data indicate that histogram analysis using the estimation of the P10 of voxels from a single measurement can be used for differentiating between nonfunctioning AAs and PCCs. This is important because it increases the value of histogram analysis for the characterization of LPAs.